10 research outputs found

An intuitive Python interface for Bioconductor libraries demonstrates the utility of language translators

Author: DG Bobrow
JE Stajich
L Prechelt
Laurent Gautier
MD Robinson
PJ Cock
R Development Core Team
R Knight
RC Gentleman
RCG Holland
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Computer languages can be domain-related, and in multidisciplinary projects, knowledge of several languages is needed in order to implement ideas quickly. Moreover, each computer language has its relative strong points, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. Results: The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Conclusions: Bioconductor is no longer reserved solely for R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

The p53HMM algorithm: using profile hidden Markov models to detect p53-responsive genes

Author: A Contente
A Krogh
AJ Levine
Arnold Levine
B Ma
B Schuster-Böckler
C Barrett
Eduardo Sontag
G Stormo
J Hoh
J Lee
JC Bourdon
M Djordjevic
Q Zhou
R Durbin
R Hughey
RCG Holland
SE Kern
SR Eddy
T Riley
T Tan
TD Schneider
Todd Riley
VD Marinescu
W Hu
WD Funk
WS el Deiry
X Yu
Xin Yu
Y Barash
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: A computational method (called p53HMM) is presented that utilizes Profile Hidden Markov Models (PHMMs) to estimate the relative binding affinities of putative p53 response elements (REs), both p53 single-sites and cluster-sites. These models incorporate a novel "Corresponded Baum-Welch" training algorithm that provides increased predictive power by exploiting the redundancy of information found in the repeated, palindromic p53-binding motif. The predictive accuracy of these new models is compared against that of other predictive models, including position specific score matrices (PSSMs, or weight matrices). We also present a new dynamic acceptance threshold, dependent upon a putative binding site's distance from the Transcription Start Site (TSS) and its estimated binding affinity. This new criterion for classifying putative p53-binding sites increases predictive accuracy by reducing the false positive rate. Results: Training a Profile Hidden Markov Model with corresponding positions matching a combined-palindromic p53-binding motif creates the best p53-RE predictive model. The p53HMM algorithm is available online: http://tools.csb.ias.edu Conclusion: Using Profile Hidden Markov Models with training methods that exploit the redundant information of the homotetramer p53 binding site provides better predictive models than weight matrices (PSSMs). These methods may also boost performance when applied to other transcription factor binding sites.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A Quick Guide for Developing Effective Bioinformatics Programming Skills

Author: A Matsunaga
Atul J. Butte
B Smith
DW Mount
Fran Lewitter
H Mangalam
I Bogdan
ITS Li
J Aerts
J Dean
J Kinser
J Kleinjung
J Tisdall
JD Tisdall
JE Stajich
Joel T. Dudley
K Chaichoompu
K Lee
M Farrar
M Halling-Brown
M Model
M Schatz
MC Schatz
MS Friedrichs
NF Noy
O Bodenreider
PJ Cock
R Chen
RA Dwyer
RC Gentleman
RCG Holland
RT Fielding
S Kumar
SB Hedges
T Oliver
T Rognes
Y Gu
Y Liu
YS Dandass
Publication venue: Public Library of Science
Publication date: 01/01/2009

Bioinformatics programming skills are becoming a necessity across many facets of biology and medicine, owed in part to the continuing explosion of biological dat

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

iPhy: an integrated phylogenetic workbench for supermatrix analyses

Author: A Bateman
A Criscuolo
A de Queiroz
A Rokas
A Stamatakis
ACJ Roth
AJ Drummond
AM Waterhouse
B Kolaczkowski
B Roure
CW Dunn
DR Maddison
EW Sayers
F Ronquist
F Schreiber
G Stoesser
Georgios D Koutsovoulos
H van Megen
I Letunic
J Parkinson
J Ruan
JC Regier
JJ Wiens
K Meusemann
LY Geer
M Han
M Jones
Mark L Blaxter
Martin O Jones
ML Blaxter
MP Nesnidal
NC Sheffield
P Rice
RC Edgar
RCG Holland
SF Altschul
V Morell
W Li
WR Pearson
X Huang
Y Tateno
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data. In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use. Results: Here we present iPhy, a web application that lets non-technical users assemble, share and analyse DNA sequence datasets for multigene phylogenetic investigations. Built on a simple client-server architecture, iPhy eases the collection of gene sets for analysis, facilitates alignment and reliably generates phylogenetic analysis-ready data files. Phylogenetic trees generated in external programs can be imported and stored, and iPhy integrates with iTol to allow trees to be displayed with rich data annotation. The datasets collated in iPhy can be shared through the client interface. We show how systematic biases can be addressed by using explicit criteria when selecting sequences for analysis from a large dataset. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server for advanced users. Conclusions: iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Identification of gene co-regulatory modules and associated cis-elements involved in degenerative heart disease

Author: A Subramanian
AI Su
Arkady M Pertsov
AS Barth
BJ Wilkins
C Danko
C Kioussi
Charles G Danko
DW Jeong
E Segal
F Tan
F Wittchen
G Dennis
H Rindt
H Wakaguri
J Hwang
J Tian
J Wang
JA Towbin
JD Barrans
JL Hall
KA Dellow
LA Megeney
M Flesch
M Gupta
MA Beer
MB Eisen
MM Kittleson
MS Parmacek
OV Kel-Margoulis
PK Bhavsar
R Bassel-Duby
R Development Core Team
R Edgar
R Gentleman
R Grzeskowiak
RCG Holland
S Malik
T Sugimoto
TH Christensen
TJP Hubbard
VR Iyer
WE Johnson
X Xie
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Cardiomyopathies, degenerative diseases of cardiac muscle, are among the leading causes of death in the developed world. Microarray studies of cardiomyopathies have identified up to several hundred genes that significantly alter their expression patterns as the disease progresses. However, the regulatory mechanisms driving these changes, in particular the networks of transcription factors involved, remain poorly understood. Our goals are (A) to identify modules of co-regulated genes that undergo similar changes in expression in various types of cardiomyopathies, and (B) to reveal the specific pattern of transcription factor binding sites, cis-elements, in the proximal promoter region of genes comprising such modules. Methods: We analyzed 149 microarray samples from human hypertrophic and dilated cardiomyopathies of various etiologies. Hierarchical clustering and Gene Ontology annotations were applied to identify modules enriched in genes with highly correlated expression and a similar physiological function. To discover motifs that may underlie changes in expression, we used the promoter regions for genes in three of the most interesting modules as input to motif discovery algorithms. The resulting motifs were used to construct a probabilistic model predictive of changes in expression across different cardiomyopathies. Results: We found that three modules with the highest degree of functional enrichment contain genes involved in myocardial contraction (n = 9), energy generation (n = 20), or protein translation (n = 20). Motif discovery tools revealed that genes in the contractile module contain a TATA-box followed by a CACC-box and are depleted in other GC-rich motifs, whereas genes in the translation module contain a pyrimidine-rich initiator, Elk-1, SP-1, and a novel motif with a GCGC core. A naïve Bayes classifier revealed that patterns of motifs are statistically predictive of expression patterns, with odds ratios of 2.7 (contractile), 1.9 (energy generation), and 5.5 (protein translation). Conclusion: We identified patterns composed of putative cis-regulatory motifs enriched in the upstream promoter sequence of genes that undergo similar changes in expression secondary to cardiomyopathies of various etiologies. Our analysis is a first step towards understanding transcription factor networks that are active in regulating gene expression during degenerative heart disease.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Predicting DNA-Binding Specificities of Eukaryotic Transcription Factors

Author: A Juncker
A Kel
A Moll
A Prakash
A Sandelin
A Sarai
Adrian Schröder
AM Leontovich
AM Waterhouse
Andreas Zell
BC Foat
BE Engelhardt
C Bock
C Wrzodek
Carsten Henneges
CJ Harrison
CJ Mungall
CM Bergman
CS Leslie
D Alamanova
D Wilson
D Zhou
DA Rodionov
DE Newburger
Dierk Wanke
DL Wheeler
E Boutet
E Kretschmann
E Wingender
G Badis
H Hegyi
H Li
H Saigo
HG Roider
J Kilian
J Kopp
J Supper
J Zhu
JA Gerlt
JC Bryne
JL Risler
Jochen Supper
Johannes Eichner
Jonas Eichner
JV Turatsinze
K Higo
K Liolios
K Niefind
K Pearson
L Liao
L Narlikar
L Wei
LJ Jensen
M Akerfelt
M Piipari
MA Andrade
MC Teixeira
MO Dayhoff
N Shental
P Baldi
P Bork
P Flicek
P Stegmaier
PH von Hippel
PK Mehta
PV Loo
R Bonneau
R Lüthy
RCG Holland
RV Davuluri
S Aerts
S Henikoff
S Kawashima
S Mahony
S Miyazawa
SB Needleman
SJ Maerkl
T Miyata
Tim J. Hubbard
TM Alleyne
U Gerland
UJ Pape
V Matys
XD Liu
Publication venue: Public Library of Science
Publication date: 01/01/2010

Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, are experimentally characterized only for a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure for the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central